Create an interactive Parallel Plot

To demonstrate the use of the interactive parallel plot, we use a project already loaded into the CKG database.

[1]:
import pandas as pd

from ckg.report_manager import project, dataset, report
from ckg.analytics_core.viz import viz as plots

import networkx as nx
from networkx.readwrite import json_graph

from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

from scipy.stats import zscore
init_notebook_mode(connected=True)
%matplotlib inline

import ipywidgets as widgets
from ipywidgets import interact, interact_manual
WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.

We create a new project object and load its data and report.

[2]:
my_project = project.Project(identifier='P0000001', datasets={}, report={})
my_project.load_project_data()
my_project.load_project_report()

We can now access all the results for each data type.

[3]:
my_project.list_datasets()
[3]:
dict_keys(['clinical', 'multiomics', 'proteomics'])

We will use the results from the proteomics analyses, so we retrieve the ‘proteomics’ dataset for further analysis.

[4]:
proteomics_dataset = my_project.get_dataset('proteomics')

The available analyses for this dataset are:

[5]:
my_project.get_dataset('proteomics').list_dataframes()
[5]:
['complex_associations',
 'correlation_correlation',
 'disease_associations',
 'drug_associations',
 'go annotation',
 'go_enrichment_Biological_processes_regulation_enrichment',
 'interaction_network',
 'literature_associations_publications_abstracts',
 'number of modified proteins',
 'number of peptides',
 'number of proteins',
 'original',
 'overview statistics_summary',
 'pathway annotation',
 'pathway_enrichment_Pathways_regulation_enrichment',
 'processed',
 'protein biomarkers',
 'regulated',
 'regulation table',
 'tissue qcmarkers']

We can access the different dataframes like this:

[6]:
my_project.get_dataset('proteomics').get_dataframe('go annotation')
[6]:
annotation group identifier source
0 mitochondrial genome maintenance None TYMP~P19971 UniProt
1 maltose metabolic process None MGAM~O43451 UniProt
2 maltose metabolic process None GAA~P10253 UniProt
3 ribosomal large subunit assembly None RPL11~P62913 UniProt
4 ribosomal large subunit assembly None RPL6~Q02878 UniProt
5 ribosomal large subunit assembly None RPL3~P39023 UniProt
6 ribosomal large subunit assembly None RPLP0~P05388 UniProt
7 ribosomal small subunit assembly None RPS28~P62857 UniProt
8 ribosomal small subunit assembly None RPS5~P46782 UniProt
9 ribosomal small subunit assembly None RPS14~P62263 UniProt
10 ribosomal small subunit assembly None RPS19~P39019 UniProt
11 ribosomal small subunit assembly None RPS27~P42677 UniProt
12 very long-chain fatty acid metabolic process None ACAA1~P09110 UniProt
13 autophagosome assembly None RAB1A~P62820 UniProt
14 autophagosome assembly None NSFL1C~Q9UNZ2 UniProt
15 autophagosome assembly None UBQLN1~Q9UMX0 UniProt
16 autophagosome assembly None RAB7A~P51149 UniProt
17 urea cycle None ASS1~P00966 UniProt
18 urea cycle None CPS1~P31327 UniProt
19 urea cycle None OTC~P00480 UniProt
20 urea cycle None ARG1~P05089 UniProt
21 urea cycle None ASL~P04424 UniProt
22 citrulline metabolic process None ASS1~P00966 UniProt
23 argininosuccinate metabolic process None ASS1~P00966 UniProt
24 ribosomal subunit export from nucleus None RAN~P62826 UniProt
25 ribosomal subunit export from nucleus None EIF6~P56537 UniProt
26 ribosomal large subunit export from nucleus None RAN~P62826 UniProt
27 ribosomal large subunit export from nucleus None NPM1~P06748 UniProt
28 ribosomal small subunit export from nucleus None NPM1~P06748 UniProt
29 ribosomal small subunit export from nucleus None RAN~P62826 UniProt
... ... ... ... ...
17753 negative regulation of extrinsic apoptotic sig... None SCG2~P13521 UniProt
17754 negative regulation of extrinsic apoptotic sig... None GSTP1~P09211 UniProt
17755 negative regulation of extrinsic apoptotic sig... None LMNA~P02545 UniProt
17756 negative regulation of extrinsic apoptotic sig... background THBS1~P07996 UniProt
17757 positive regulation of extrinsic apoptotic sig... None PTPRC~P08575 UniProt
17758 positive regulation of extrinsic apoptotic sig... background AGT~P01019 UniProt
17759 positive regulation of extrinsic apoptotic sig... None BID~P55957 UniProt
17760 positive regulation of extrinsic apoptotic sig... background PDIA3~P30101 UniProt
17761 positive regulation of extrinsic apoptotic sig... None PAK2~Q13177 UniProt
17762 positive regulation of extrinsic apoptotic sig... None PYCARD~Q9ULZ3 UniProt
17763 regulation of extrinsic apoptotic signaling pa... None FGFR1~P11362 UniProt
17764 negative regulation of extrinsic apoptotic sig... background PRDX2~P32119 UniProt
17765 negative regulation of extrinsic apoptotic sig... None COL2A1~P02458 UniProt
17766 positive regulation of extrinsic apoptotic sig... None PPP1CA~P62136 UniProt
17767 regulation of intrinsic apoptotic signaling pa... None PYCARD~Q9ULZ3 UniProt
17768 negative regulation of intrinsic apoptotic sig... None DDX3X~O00571 UniProt
17769 positive regulation of intrinsic apoptotic sig... background S100A8~P05109 UniProt
17770 positive regulation of intrinsic apoptotic sig... None BID~P55957 UniProt
17771 positive regulation of intrinsic apoptotic sig... None SLC9A3R1~O14745 UniProt
17772 positive regulation of intrinsic apoptotic sig... background S100A9~P06702 UniProt
17773 regulation of phosphatidylcholine biosynthetic... None FABP3~P05413 UniProt
17774 regulation of store-operated calcium entry None CD84~Q9UIB8 UniProt
17775 regulation of store-operated calcium entry None STC2~O76061 UniProt
17776 regulation of store-operated calcium entry None STIM1~Q13586 UniProt
17777 positive regulation of cation channel activity None CTSS~P25774 UniProt
17778 regulation of semaphorin-plexin signaling pathway background NCAM1~P13591 UniProt
17779 negative regulation of cysteine-type endopepti... None PARK7~Q99497 UniProt
17780 positive regulation of cysteine-type endopepti... background GSN~P06396 UniProt
17781 positive regulation of cysteine-type endopepti... None FAS~P25445 UniProt
17782 negative regulation of cysteine-type endopepti... None PAK2~Q13177 UniProt

17783 rows × 4 columns

In this case, we will use the processed dataframe with transformed and imputed LFQ intensities. We then normalize each protein column using the z-score.

[7]:
proteomics_dataset = my_project.get_dataset('proteomics')
processed_df = proteomics_dataset.get_dataframe('processed')
[8]:
processed_df.head()
[8]:
A2M~P01023 A30~A2MYE2 ABI3BP~Q7Z7G0 ACE~P12821 ACTB~P60709 ACTN1~P12814 ADA2~Q9NZK5 ADAMTS13~Q76LX8 ADAMTSL4~Q6UY14 ADH4~P08319 ... VIM~P08670 VK3~A2N2F4 VNN1~O95497 VTN~P04004 VWF~P04275 YWHAZ~P63104 group sample scFv~Q65ZC9 subject
0 38.005564 28.173504 21.588427 22.213865 27.090330 25.039968 23.442151 24.010605 25.085820 23.389032 ... 24.178889 25.835908 22.480055 32.815815 28.922779 19.246215 Cirrhosis AS1181 27.788928 S368
1 37.309118 27.981907 27.342062 23.847270 27.461155 25.896268 23.754503 24.135818 19.241174 22.148706 ... 23.709777 25.004889 23.852908 32.722121 29.881279 22.141285 Cirrhosis AS1182 26.869972 S369
2 37.384952 28.857627 20.156993 22.863630 27.929764 24.295225 23.359443 24.121788 24.923476 23.017163 ... 23.599064 26.271650 24.232132 32.755752 29.444625 18.901149 Cirrhosis AS1184 28.069328 S371
3 38.417225 28.978380 25.501910 22.992774 27.152479 25.231288 23.701340 24.568309 24.878802 26.388112 ... 24.179076 25.929200 24.269047 32.714014 29.397176 22.216971 Cirrhosis AS1185 28.170209 S372
4 37.471303 28.748744 20.658038 21.949025 27.537048 22.392992 22.406264 24.961173 22.246468 24.339540 ... 23.865224 26.701340 20.490667 32.722691 28.540895 20.797497 Cirrhosis AS1186 28.612280 S373

5 rows × 517 columns

[9]:
processed_df = processed_df.drop(['sample', 'subject'], axis=1).set_index('group').apply(zscore).reset_index()
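As a rough illustration of what the column-wise z-score does, the sketch below applies the same calculation by hand to a toy dataframe. The protein names (`PROT_A`, `PROT_B`) and all values are made up for this example; the formula uses the population standard deviation (`ddof=0`), which is the default of `scipy.stats.zscore`.

```python
import pandas as pd

# Toy intensities; the protein names and values are hypothetical.
toy = pd.DataFrame({
    "group": ["Cirrhosis", "Cirrhosis", "Healthy", "Healthy"],
    "PROT_A": [20.0, 22.0, 24.0, 26.0],
    "PROT_B": [30.0, 30.5, 29.5, 30.0],
})

values = toy.set_index("group")
# Column-wise z-score with the population standard deviation (ddof=0),
# matching the default behaviour of scipy.stats.zscore.
toy_z = ((values - values.mean()) / values.std(ddof=0)).reset_index()
print(toy_z.round(3))
```

After this step every protein column has mean 0 and unit variance, so proteins with very different intensity ranges can share one axis in the plot.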

In order to find clusters of proteins, we access the report and the protein-protein correlation network as a dictionary.

[10]:
proteomics_report = my_project.get_dataset('proteomics').report
proteomics_report.list_plots()
[10]:
dict_keys(['0_date', '0~proteomics_pipeline~cytoscape_network', '10~regulation_description~description', '11~regulation_anova~basicTable', '12~regulation_anova~volcanoplot', '13~correlation_correlation~network', '14~interaction_network~network', '15~complex_associations~basicTable', '16~drug_associations~basicTable', '17~disease_associations~basicTable', '18~literature_associations_publications_abstracts~basicTable', '19~literature_associations_publications_abstracts~wordcloud', '1~overview statistics_summary~multiTable', '20~go_enrichment_Biological_processes_regulation_enrichment~basicTable', '21~pathway_enrichment_Pathways_regulation_enrichment~basicTable', '2~proteins~barplot', '3~proteins~basicTable', '4~coefficient_variation_coefficient_of_variation~scatterplot_matrix', '5~quality_control_qcmarkers~qcmarkers_boxplot', '6~ranking_ranking_with_markers~ranking', '7~ranking_ranking_with_markers~basicTable', '8~stratification_description~description', '9~stratification_pca~pca'])
[14]:
correlation_net_dict = proteomics_report.get_plot('13~correlation_correlation~network')[0]

To convert the dictionary into a network, we access the JSON version stored in the dictionary and convert it using the NetworkX package.

[15]:
correlation_net = json_graph.node_link_graph(correlation_net_dict['net_json'])

Now that we have a network with proteins colored by cluster, we can convert this information into a dataframe to be used in this Jupyter Notebook.

[16]:
correlation_df = pd.DataFrame.from_dict(correlation_net.nodes(data=True))
correlation_df = correlation_df[0].to_frame().join(correlation_df[1].apply(pd.Series))
[17]:
correlation_df.columns = ['identifier', 'degree', 'radius', 'color', 'cluster']
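Renaming the columns by position works only if the node attributes arrive in the expected order. A possible alternative, sketched here on toy data, builds the dataframe from the attribute keys themselves. The node list below is a hypothetical stand-in for `list(correlation_net.nodes(data=True))`, and the attribute keys (`degree`, `radius`, `color`, `cluster`) are assumed to match the column names used above.

```python
import pandas as pd

# Toy stand-in for list(correlation_net.nodes(data=True)):
# (node id, attribute dict) pairs. All values are hypothetical.
toy_nodes = [
    ("PROT_A", {"degree": 3, "radius": 6, "color": "#1f77b4", "cluster": 0}),
    ("PROT_B", {"degree": 1, "radius": 4, "color": "#ff7f0e", "cluster": 1}),
    ("PROT_C", {"degree": 2, "radius": 5, "color": "#1f77b4", "cluster": 0}),
]

# One row per node; columns are named from the attribute keys
# rather than from their positional order.
toy_correlation_df = pd.DataFrame(
    [{"identifier": node, **attrs} for node, attrs in toy_nodes]
)
print(toy_correlation_df)
```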

Since the correlation network was generated using a cut-off, not all the proteins in the processed dataframe are part of a cluster. We therefore filter the processed dataframe and keep only the proteins that are present in the correlation clusters.

[18]:
min_val = processed_df._get_numeric_data().min().min().round()
max_val = processed_df._get_numeric_data().max().max().round()
processed_df = processed_df[list(correlation_df.identifier) + ['group']]
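The column selection above raises a `KeyError` if any clustered identifier is missing from the dataframe. A slightly more defensive variant, shown here on toy data with hypothetical protein names, keeps only the identifiers that actually exist as columns.

```python
import pandas as pd

# Toy normalized values; names and numbers are hypothetical.
toy_processed = pd.DataFrame({
    "group": ["Cirrhosis", "Healthy"],
    "PROT_A": [0.5, -0.5],
    "PROT_B": [1.0, -1.0],
    "PROT_C": [-0.2, 0.2],
})
clustered = ["PROT_A", "PROT_C", "PROT_X"]  # PROT_X is absent on purpose

# Keep only identifiers that are real columns, so a missing one
# is skipped instead of raising a KeyError.
keep = [p for p in clustered if p in toy_processed.columns]
toy_filtered = toy_processed[keep + ["group"]]
print(toy_filtered.columns.tolist())
```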

Ready! To build the parallel plot, we create a dictionary mapping each cluster to its color, and filter the processed dataframe to include only the proteins in a specific cluster.

Using the Jupyter Widgets interact function, we can make the plot interactive and allow the visualization of a cluster selected by the user.

[19]:
from IPython.display import display, HTML
[20]:
@interact
def plot_parallel_plot(cluster=correlation_df.cluster.unique()):
    cluster_colors = dict(zip(correlation_df.cluster, correlation_df.color))
    clusters = correlation_df.groupby('cluster')
    identifiers = clusters.get_group(cluster)['identifier'].tolist()
    title= "Parallel plot cluster: {}".format(cluster)
    df = processed_df.set_index('group')[identifiers].reset_index()
    figure = plots.get_parallel_plot(df, identifier=cluster, args={'color':cluster_colors[cluster],'group':'group',
                                                                          'title':title,
                                                                          'zscore':False})
    display(HTML("<p>{}</p>".format(",".join(identifiers))))
    iplot(figure.figure)
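To check what the interactive function passes to the plot for a given cluster, the selection logic can be run without widgets. The sketch below mirrors that logic on toy data; the dataframes, protein names, colors, and the helper `select_cluster` are hypothetical and not part of CKG.

```python
import pandas as pd

# Toy cluster table and normalized values; all entries are hypothetical.
toy_clusters = pd.DataFrame({
    "identifier": ["PROT_A", "PROT_B", "PROT_C"],
    "color": ["#1f77b4", "#ff7f0e", "#1f77b4"],
    "cluster": [0, 1, 0],
})
toy_values = pd.DataFrame({
    "group": ["Cirrhosis", "Healthy"],
    "PROT_A": [0.5, -0.5],
    "PROT_B": [1.0, -1.0],
    "PROT_C": [-0.2, 0.2],
})

def select_cluster(cluster):
    """Return the color and the sub-dataframe for one cluster."""
    colors = dict(zip(toy_clusters["cluster"], toy_clusters["color"]))
    members = (
        toy_clusters.groupby("cluster").get_group(cluster)["identifier"].tolist()
    )
    subset = toy_values.set_index("group")[members].reset_index()
    return colors[cluster], subset

color, subset = select_cluster(0)
print(color, subset.columns.tolist())
```

Separating the selection from the plotting call makes it easier to inspect the proteins in a cluster before rendering the figure.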